ABSTRACT

This thesis addresses the question of whether a reliable multicast protocol can be designed that enjoys all the scaling properties of receiver-initiated protocols, while still being able to operate correctly with finite memory. To answer this question, we analyze the maximum throughput of the known classes of reliable multicast protocols that have been proposed to solve the acknowledgment (ACK) implosion problem of sender-initiated reliable multicast protocols. We introduce a new taxonomy of reliable multicast protocols, based on the premise that the mechanisms used to release data at the source after correct delivery should be decoupled from the mechanisms used to pace the transmission of data and to effect error recovery. Receiver-initiated protocols, which are based entirely on negative acknowledgments (NAKs) sent from the receivers to the sender, are shown to require infinite buffers in order to prevent deadlocks. Two other solutions to the ACK-implosion problem are tree-based protocols and ring-based protocols. The former organize the receivers in a tree and send ACKs along the tree; the latter send ACKs to the sender along a ring of receivers. These two classes of protocols are shown to operate correctly with finite buffers. We show that the tree-based protocols constitute the most scalable class of all reliable multicast protocols proposed to date.

1. Introduction

The increasing popularity of real-time applications supporting either group collaboration or the reliable dissemination of multimedia information over the Internet is making the provision of reliable and unreliable end-to-end multicast services an integral part of its architecture. Although reliable broadcast protocols have existed for quite some time (e.g., see [1]), viable approaches to the provision of reliable multicasting over the Internet are just emerging. The reliable multicast problem facing the future Internet is compounded by its current size and continuing growth, which make the handling of acknowledgments a major challenge, commonly referred to as the acknowledgment (ACK) implosion problem.

The two most popular approaches to reliable multicasting proposed to date are called sender-initiated and receiver-initiated. In the sender-initiated approach, the sender maintains the state of all the receivers to whom it has to send information and from whom it has to receive ACKs. Each of the sender's transmissions or retransmissions is multicast to all receivers; for each packet that a receiver obtains correctly, it sends a unicast ACK to the sender. In contrast, in the receiver-initiated approach, each receiver informs the sender of the information that is in error or missing; the sender multicasts all packets, giving priority to retransmissions, and a receiver sends a negative acknowledgment (NAK) when it detects an error or a lost packet.

The first comparative analysis of sender-initiated and receiver-initiated reliable multicast protocols was presented by Pingali et al. [2, 3]. This analysis showed that receiver-initiated protocols are far more scalable than sender-initiated protocols, because the maximum throughput of sender-initiated protocols depends on the number of receivers, while the maximum throughput of receiver-initiated protocols is independent of the number of receivers (when the probability of packet loss is negligible). However, as this thesis demonstrates, the receiver-initiated protocols proposed to date cannot prevent deadlocks when they operate with finite memory.

This thesis addresses the question of whether a reliable multicast protocol can be designed that enjoys all the scaling properties of receiver-initiated protocols, while still being able to operate correctly with finite memory. To address this question, the previous analysis [2, 3] is extended to consider the maximum throughput of generic ring-based protocols and two classes of tree-based protocols. These classes are the other three known approaches that can be used to solve the ACK implosion problem. Our analysis shows that tree- and ring-based protocols can work correctly with finite memory, that both are scalable, and that tree-based protocols are the best choice in terms of processing and memory requirements.

The results presented in this thesis are theoretical in nature and apply to generic protocols, rather than to specific implementations; however, we believe that they provide valuable architectural insight for the design of future reliable multicast protocols. Chapter 2 presents a new taxonomy of reliable multicast protocols that organizes known approaches into four protocol classes and discusses how several key papers in the literature fit within this taxonomy. This taxonomy is based on the premise that the analysis of the mechanisms used to release data from memory after their correct reception by all receivers can be decoupled from the study of the mechanisms used to pace the transmission of data within the session and to detect transmission errors. Using this taxonomy, we argue that all reliable unicast and multicast protocols proposed to date that use NAKs and work correctly with finite memory use ACKs to release memory and NAKs to improve throughput. Chapter 3 addresses the correctness of the various classes of reliable multicast protocols introduced in our taxonomy, showing that the receiver-initiated protocols proposed to date require infinite memory. Chapter 4 extends the analysis by Pingali et al. [2, 3] to the maximum throughput of three protocol classes: tree-based, tree-based with local NAK avoidance and periodic polling (tree-NAPP), and ring-based protocols. Although the maximum throughputs of receiver-initiated, tree-based¹, and ring-based protocols are all independent of the number of receivers as the probability of error goes to zero, we show that only tree-based protocols can be completely scalable under any condition, i.e., even when the probability of error is non-negligible. Chapter 5 provides numerical results on the performance of the protocol classes under different scenarios, and discusses the implications of our results in light of recent work on reliable multicasting. Chapter 6 provides concluding remarks.

¹ To avoid confusion, comments on “receiver-initiated” and “tree-based” protocols are inclusive of all

2. Background

2.1 A New Taxonomy of Reliable Multicast Protocols

We now describe the four generic approaches known to date for reliable multicasting. Well-known protocols (for unicast and multicast purposes) are mapped into each class. Our taxonomy differs from prior work [3, 2, 4] addressing receiver-initiated strategies for reliable multicasting in that we decouple the definition of the mechanisms needed for pacing data transmission from the mechanisms needed for the allocation of memory at the source. Using this approach, the protocol can be thought of as using two windows: a congestion window (cw) that advances based on feedback from receivers regarding the pacing of transmissions and the detection of errors, and a memory allocation window (mw) that advances based on feedback from receivers as to whether the sender can erase data from memory. In practice, protocols may use a single window for pacing and memory (e.g., TCP [5]) or separate windows (e.g., NETBLT [6]). It will become apparent that this decoupling is critical to obtaining an accurate understanding of why reliable unicast and multicast protocols scale and work correctly with finite memory.
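To make the decoupling concrete, the following is a minimal Python sketch of a sender that keeps the two windows separately. It is an illustration under stated assumptions, not code from TCP, NETBLT, or any protocol cited here: the class name `TwoWindowSender`, the window sizes, and the two feedback methods (`on_pacing_feedback` for cw feedback, `on_release` for mw feedback) are all hypothetical. It shows that pacing feedback alone can open the cw while the mw, and therefore memory, stays full.

```python
from dataclasses import dataclass, field


@dataclass
class TwoWindowSender:
    """Hypothetical sender that tracks pacing (cw) and memory (mw) separately."""
    cw_size: int        # assumed limit on packets not yet cleared by pacing feedback
    mw_size: int        # assumed limit on packets held in memory
    next_seq: int = 0   # next new sequence number
    cw_base: int = 0    # oldest packet not yet cleared by pacing feedback
    mw_base: int = 0    # oldest packet still held in memory
    buffer: dict = field(default_factory=dict)

    def can_send(self) -> bool:
        # A new packet needs room in BOTH windows.
        return (self.next_seq - self.cw_base < self.cw_size
                and self.next_seq - self.mw_base < self.mw_size)

    def send(self, payload) -> int:
        if not self.can_send():
            raise RuntimeError("window closed")
        seq = self.next_seq
        self.buffer[seq] = payload
        self.next_seq += 1
        return seq

    def on_pacing_feedback(self, seq: int) -> None:
        # Pacing/error feedback advances only the cw; memory is untouched.
        self.cw_base = max(self.cw_base, min(seq + 1, self.next_seq))

    def on_release(self, seq: int) -> None:
        # Release feedback (seq is safe at every receiver) advances the mw
        # and frees memory.
        seq = min(seq, self.next_seq - 1)
        while self.mw_base <= seq:
            self.buffer.pop(self.mw_base, None)
            self.mw_base += 1


s = TwoWindowSender(cw_size=4, mw_size=2)
s.send("a")
s.send("b")
s.on_pacing_feedback(1)          # the cw opens...
print(s.can_send())              # False: ...but the mw is still full
s.on_release(0)
print(s.can_send(), sorted(s.buffer))   # True [1]
```

A protocol such as TCP can be viewed as the special case in which the same feedback drives both methods at once.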

Each reliable protocol assumes the existence of multicast routing tree(s) provided by underlying multicast routing protocols. In the Internet, these trees will be built using such protocols as DVMRP [7], Core Based Trees (CBT) [8], or Protocol Independent Multicast (PIM) [9].

2.2 Sender-Initiated Protocols

In the past [2], sender-initiated protocols have been characterized as placing the responsibility of reliable delivery at the sender. However, this characterization is overly restrictive and does not reflect the way in which several reliable multicast protocols that rely on positive acknowledgments from the receivers to the source have been designed. In our taxonomy, a sender-initiated reliable multicast protocol is one that requires the source to receive ACKs from all the receivers before it is allowed to release memory for the data associated with the ACKs. Clearly, the source is required to know the constituency of the receiver set, and the scheme suffers from the ACK-implosion problem. However, this characterization leaves unspecified the mechanism used for pacing transmissions and for detecting transmission errors. Either the source or the receivers can be in charge of the retransmission timeouts.

The traditional approach to pacing and transmission error detection (e.g., TCP in the context of reliable unicasting) is for the source to be in charge of the retransmission timeout. However, as suggested by the results reported by Floyd et al. [4], a better approach for pacing a multicast session is for each receiver to set its own timeout. A receiver sends ACKs to the source at a rate that it can accept, and sends a NAK to the source after not receiving a correct packet from the source for an amount of time that exceeds its retransmission timeout. An ACK can refer to a specific packet or a window of packets, depending on the specific retransmission strategy.

Notice that, regardless of whether a sender-based or receiver-based retransmission strategy is used, the source is still in charge of deallocating memory after receiving all the ACKs for a given packet or set of packets. The source keeps packets in memory until every receiver node has positively acknowledged receipt of the data. If a sender-based retransmission strategy is used, the sender “polls” the receivers for ACKs by retransmitting after a timeout. If a receiver-based retransmission strategy is used, the receivers “poll” the source (with an ACK) after they time out.¹
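A minimal sketch of the release rule for sender-initiated protocols, assuming a per-packet ACK strategy; the class and method names are hypothetical. It shows the two properties noted above: the source must hold the full receiver set, and every ACK from every receiver is processed at the source, which is the root of ACK implosion.

```python
class SenderInitiatedSource:
    """Hypothetical source that frees a packet only after ACKs from all receivers."""

    def __init__(self, receivers):
        self.receivers = frozenset(receivers)  # the source must know the receiver set
        self.memory = {}          # seq -> payload
        self.pending = {}         # seq -> receivers that have not ACKed yet
        self.acks_processed = 0   # grows with packets x receivers: ACK implosion

    def multicast(self, seq, payload):
        self.memory[seq] = payload
        self.pending[seq] = set(self.receivers)

    def on_ack(self, seq, receiver):
        self.acks_processed += 1
        waiting = self.pending.get(seq)
        if waiting is None:
            return                 # duplicate ACK for a packet already released
        waiting.discard(receiver)
        if not waiting:            # last outstanding ACK: release memory
            del self.pending[seq]
            del self.memory[seq]

    def unacked_by(self, seq):
        """Receivers whose ACK for seq is still outstanding."""
        return set(self.pending.get(seq, ()))


src = SenderInitiatedSource(receivers=["r1", "r2", "r3"])
src.multicast(0, "data")
src.on_ack(0, "r1")
src.on_ack(0, "r2")
print(src.unacked_by(0))                      # {'r3'}: packet 0 is still in memory
src.on_ack(0, "r3")
print(0 in src.memory, src.acks_processed)    # False 3
```

Whether the timeout that triggers a retransmission lives at the source or at the receivers does not change this release rule.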

It is important to note that just because a reliable multicast protocol uses NAKs does not mean that it is receiver-initiated, i.e., that NAKs are the basis for the source to ascertain when it can release data from memory. The combination of ACKs and NAKs has been used extensively in the past for reliable unicast and multicast protocols. For example, NETBLT is a unicast protocol that uses a NAK scheme for retransmission, but only on small partitions of the data (i.e., its cw). Between partitions, which are called “buffers,” the receiver sends an ACK for all the data in the buffer (i.e., the mw). Only upon receipt of this ACK does the source release data from memory; therefore, NETBLT is really sender-initiated. In fact, NAKs are not needed for NETBLT's correctness, because a buffer can be treated as one large packet that must eventually be ACKed; NAKs are important only as a mechanism to improve throughput by allowing the source to know sooner when it should retransmit some data.

¹ Of course, the source still needs a timer to ascertain when its connection with a receiver has failed.
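The division of labor in NETBLT can be sketched as follows. This is a hypothetical simplification, not NETBLT's actual packet format or state machine: `BufferSender`, `load`, `on_nak`, and `on_buffer_ack` are illustrative names. NAKs only produce early repairs, while the single ACK per buffer is the only event that frees memory, so removing `on_nak` would slow the sender down without making it incorrect.

```python
class BufferSender:
    """Hypothetical NETBLT-style sender: NAKs trigger repairs, one ACK per buffer frees memory."""

    def __init__(self):
        self.buffer = {}          # current partition ("buffer"): index -> payload
        self.retransmitted = []   # indices repaired in response to NAKs

    def load(self, packets):
        if self.buffer:
            raise RuntimeError("previous buffer not yet ACKed; memory cannot be reused")
        self.buffer = dict(enumerate(packets))

    def on_nak(self, index):
        # A NAK only asks for an early repair (cw side); it never frees memory.
        self.retransmitted.append(index)
        return self.buffer[index]

    def on_buffer_ack(self):
        # The ACK for the whole buffer (mw side) is the only event that releases memory.
        self.buffer.clear()


s = BufferSender()
s.load(["p0", "p1", "p2"])
print(s.on_nak(1))      # p1 is repaired early...
print(len(s.buffer))    # ...but all 3 packets stay in memory
s.on_buffer_ack()
s.load(["p3"])          # memory is reusable only after the buffer ACK
```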

A protocol similar to NETBLT is the “Negative Acknowledgments with Periodic Polling” (NAPP) protocol [10]. This protocol is a broadcast protocol for LANs. Like NETBLT, NAPP groups the data into large partitions that are periodically ACKed, while lost packets within a partition are NAKed. NAPP advances the cw by NAKs and periodically advances the mw by ACKs. Because the use of NAKs can cause a NAK implosion at the source, NAPP uses a NAK-avoidance scheme. As in NETBLT, NAKs increase NAPP's throughput but are not necessary for its correct operation, which without them would merely be slower. The use of periodic polling limits NAPP to LANs, because the source can still suffer from an ACK-implosion problem even if ACKs occur less often.

Other sender-initiated protocols, like the Xpress Transfer Protocol (XTP) [11], were created for use on an internet, but still suffer from the ACK implosion problem.

The main limitation of sender-initiated protocols is not that ACKs are used, but the need for the source to process all of the ACKs and to know the receiver set. The two known methods that address this limitation are: (a) using NAKs instead of ACKs, and (b) delegating retransmission responsibility to members of the receiver set by organizing the receivers into a ring or a tree. We discuss both approaches subsequently.

2.3 Receiver-Initiated Protocols

Previous work [2] characterizes receiver-initiated protocols as placing the responsibility for ensuring reliable packet delivery at each receiver. The critical aspect of these protocols for our taxonomy is that no ACKs are used. The receivers send NAKs back to the source when a retransmission is needed, as detected by an error, a gap in the sequence numbers, or a timeout. Because the source receives feedback from receivers only when packets are lost and not when they are delivered, the source is unable to ascertain when it can safely release data from memory. There is no explicit mechanism in a receiver-initiated protocol for the source to release data from memory (i.e., advance the mw), even though its pacing and retransmission mechanisms are scalable and efficient (i.e., advancing the cw).
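The memory problem can be illustrated with a hypothetical NAK-only source with a bounded buffer; all names here are assumptions made for illustration. Because no message ever tells the source that a packet has reached every receiver, nothing in the class can shrink `memory`: once the buffer fills, the source must either stop sending forever or discard data that a later NAK might still request.

```python
class NakOnlySource:
    """Hypothetical receiver-initiated source: the only feedback it gets is NAKs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = {}      # seq -> payload
        self.next_seq = 0

    def can_send_new(self):
        return len(self.memory) < self.capacity

    def send_new(self, payload):
        if not self.can_send_new():
            # No ACK will ever arrive to free a slot: the session is stuck.
            raise RuntimeError("buffer full and no feedback can free it")
        seq = self.next_seq
        self.memory[seq] = payload
        self.next_seq += 1
        return seq

    def on_nak(self, seq):
        # Serve a retransmission if possible; note that nothing here frees memory.
        return self.memory.get(seq)


src = NakOnlySource(capacity=2)
src.send_new("a")
src.send_new("b")
print(src.on_nak(0))        # prints a: packet 0 can still be retransmitted
print(src.can_send_new())   # False, and no method can ever make it True again
```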

2.3.1 Receiver-Initiated Protocols with NAK-avoidance

Because receivers communicate NAKs back to the source, receiver-initiated protocols may experience a NAK-implosion problem at the source if many receivers detect transmission errors. To remedy this problem, previous work on receiver-initiated protocols [2, 4] adopts the NAK-avoidance scheme first proposed for NAPP, which is a sender-initiated protocol. Receiver-initiated with NAK-avoidance (RINA) protocols have been shown [2] to improve performance over the basic receiver-initiated protocol. The resulting generic RINA protocol is as follows [2]: The sender multicasts all packets and state information, giving priority to retransmissions. Whenever a receiver detects a packet loss, it waits for a random time period and then multicasts a NAK to the sender and all other receivers. When a receiver obtains a NAK for a packet that it has not received and for which it has started a timer to send a NAK, the receiver sets a timer and behaves as if it had sent a NAK. The expiration of a timer without the reception of the corresponding packet is the signal used to detect a lost packet. With this scheme, ideally only one NAK is sent back to the source per lost transmission for the entire receiver set. Nodes farther away from the source might not even get a chance to request a retransmission. The generic protocol does not describe how timers are set accurately; in this thesis, we assume perfect setting of timers because we are interested in the maximum attainable throughput of protocols.
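The effect of NAK avoidance on a single loss can be estimated with a small model. The sketch below is an assumption-laden simplification, not SRM's or NAPP's timer algorithm: every receiver misses the same packet, each draws a backoff uniformly between 0 and `max_backoff`, and a multicast NAK reaches every other receiver after the same fixed `delay`. A receiver stays silent only if the first NAK arrives before its own timer expires.

```python
import random


def naks_sent(n_receivers, max_backoff, delay, rng):
    """Count NAKs multicast for one packet lost at every receiver (idealized model)."""
    timers = [rng.uniform(0.0, max_backoff) for _ in range(n_receivers)]
    first = min(timers)
    # The earliest receiver always sends. Any receiver whose timer expires no
    # later than first + delay has not yet heard that NAK, so it sends a duplicate.
    return sum(1 for t in timers if t <= first + delay)


rng = random.Random(7)
for d in (0.0, 0.05, 0.5):
    print(d, naks_sent(100, max_backoff=1.0, delay=d, rng=rng))
```

With zero propagation delay only the earliest receiver sends a NAK; as `delay` grows relative to `max_backoff`, suppression fails and more duplicate NAKs reach the source, which is why the random wait must be scaled to the delays in the group.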

The generic RINA protocol we have just described constitutes the basis for the operation of the scalable reliable multicasting (SRM) algorithm [4]. SRM has been successfully embedded into an Internet collaborative whiteboard application called wb. SRM sets timers based on low-rate, periodic “session messages” multicast by every receiver. The messages specify the highest sequence number accepted from the source² and a time-stamp used by the receivers to estimate the delay from the source. The average bandwidth consumed by session messages is kept small (e.g., by keeping the frequency of session messages low). SRM's implementation requires that every node store all packets (a scheme could be used to support a “distributed memory”) or that the application layer store all relevant data. However, it is clear that the sequence number in a session message is an ACK to the last packet from the source, and that a receiver can keep “polling” the source periodically to ensure that the source eventually delivers missing packets not caught by the NAK scheme. Here again, NAKs are used to advance the cw, which is controlled by the receivers, and session messages would be used to advance the mw. We do not place SRM within our sender-initiated category, because in theory the source should not know its receiver set. In practice, the persistence of session messages forces the source to know the receiver set over time. Unfortunately, as defined, SRM defeats one of the goals of the receiver-initiated paradigm, i.e., to keep the receiver set anonymous from the source for scaling purposes.
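The observation that session messages act as ACKs can be made explicit. The sketch below assumes, beyond what SRM specifies, that each reported sequence number is cumulative (the receiver holds every packet up to it); under that assumption the source could free data only up to the minimum report, and only once it has heard from every receiver it knows of. That second condition is why using session messages to advance the mw forces the source to learn the receiver set. The function name is hypothetical.

```python
def releasable_up_to(reports, known_receivers):
    """Highest sequence number the source could free, or None if nothing is provably safe.

    Assumes each report is cumulative: receiver r holds every packet <= reports[r].
    """
    known = set(known_receivers)
    if not known or known - set(reports):
        return None   # some known receiver has not reported yet
    return min(reports[r] for r in known)


print(releasable_up_to({"a": 12, "b": 9, "c": 15}, ["a", "b", "c"]))   # 9
print(releasable_up_to({"a": 12, "c": 15}, ["a", "b", "c"]))           # None
```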

There are other issues that limit the use of RINA protocols as defined, such as SRM, for reliable multicasting. First, SRM requires that data needed for retransmission be rebuilt from the application. SRM's approach is reasonable only for applications in which only the current state of the data is desired, as is the case for a distributed whiteboard. However, the approach does not apply to multimedia applications that have no current state, but only a stream of transition states (e.g., a video channel).

Second, NAKs and retransmissions must be multicast to the entire multicast group to allow suppression of NAKs. NAK avoidance was designed for a limited scope, such as a LAN or a small number of Internet nodes (as it is used in tree-NAPP protocols). This is because the basic NAK-avoidance algorithm requires that timers be set based on updates multicast by every node. As the number of nodes increases, each node must do an increasing amount of work. Even worse, nodes that are on congested links, LANs, or regions may constantly bother the rest of the multicast group by multicasting NAKs (often referred to as the “crying baby” problem).

² Multiple sources are supported in SRM; we focus on the single-source case.

Another example of a receiver-initiated protocol is the “log-based receiver-reliable multicast” (LBRM) [12], which uses a hierarchy of log servers that store information indefinitely; receivers recover by contacting a log server. Using log servers is feasible only for applications that can afford the servers, and it leaves many issues unresolved. If a single server is used, performance can degrade due to the load at the server; if multiple servers are used, mechanisms must still be implemented to ensure that such servers have consistent information.

Ideal receiver-initiated protocols have three main advantages over sender-initiated protocols, namely: (a) the source does not know the receiver set, (b) the source does not have to process every ACK from each receiver, and (c) the receivers pace the source. The limitation of these protocols is that they have no mechanism for the source to know when it can safely release data from memory. As we have argued, known implementations of the receiver-initiated approach force nodes to know the receiver set, even if by means of bandwidth-efficient methods, or require unbounded storage at a receiver.

The following two classes organize the receiver set in ways that permit the strengths of receiver-initiated protocols to be applied on a local scale, while providing explicit mechanisms for the source to release memory safely (i.e., efficient management of the mw).

2.4 Tree-Based Protocols

Tree-based protocols are characterized by dividing the receiver set into groups and distributing retransmission responsibility over an acknowledgment tree (ACK tree) structure. Without loss of generality, our generic protocol definition assumes that each group consists of no more than B children and a group leader. Children are likely to be the group leaders of a subgroup. Acknowledgments from children in a group, including the source's own group, are sent only to the leader. A child in a group sends its acknowledgment to its parent as soon as it receives a correct packet, not when all its own children (if any) have sent their acknowledgments. Clearly, these acknowledgments differ from the ACKs or NAKs used in sender- and receiver-initiated protocols, and we refer to them as hierarchical acknowledgments (HACKs). The use of HACKs is important for throughput. Notice that, if the source had to wait for ACKs to be aggregated all the way from the leaf nodes, it would have to be paced based on the slowest tree path.
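The pacing argument can be checked on a toy tree. The sketch assumes a hypothetical, simplified delay model (not the analysis of Chapter 4) in which each parent-child link contributes a fixed round-trip time and processing time is ignored; `tree` maps each node to its `(child, rtt)` pairs. With HACKs, a node hears from its group after its slowest child link; with fully aggregated ACKs it waits for the slowest root-to-leaf path.

```python
def hack_delay(tree, node):
    """Feedback delay at `node` with HACKs: each child answers as soon as it has the packet."""
    return max((rtt for _, rtt in tree.get(node, [])), default=0.0)


def aggregated_delay(tree, node):
    """Feedback delay at `node` if each child ACKs only after its whole subtree has ACKed."""
    return max((rtt + aggregated_delay(tree, child)
                for child, rtt in tree.get(node, [])), default=0.0)


tree = {"s": [("a", 10.0), ("b", 20.0)], "a": [("c", 50.0)]}
print(hack_delay(tree, "s"), aggregated_delay(tree, "s"))   # 20.0 60.0
```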

Tree-based protocols delegate to leaders of subtrees the decision of when to delete packets from memory, which is conditional upon receipt of HACKs from the children in the group. HACKs are sent up a B-ary in-tree composed of up to three types of nodes: a source node, leaf nodes, and hop nodes. The source node is the originator of a new packet, which it multicasts to all the receivers, and has at most B children from which it processes HACKs and to which it sends retransmissions. Leaf nodes are at the bottom of the tree and are not responsible for any children. They play the same role as receivers in the sender-initiated protocol, except that they send their HACKs only to their group leaders (hop nodes) instead of sending ACKs to the source node. Hop nodes are group leaders between the source and leaf nodes. They send HACKs to their own group leaders one step higher in the tree, and they collect HACKs from the children in their group, retransmitting if necessary. They do not release data from memory until all children have acknowledged correct transmission. Obviously, a tree consisting of the source as the only leader and leaf nodes corresponds to the sender-initiated scheme.
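The per-node rules above can be sketched as a small state machine. This is a hypothetical illustration of the generic tree-based protocol, not RMTP or TMTP code: the class `HopNode` and its methods are invented names, and timers are reduced to a `missing` query that a timeout handler could use to decide which children still need a retransmission. The same class models a leaf node when `children` is empty, since such a node has nobody to retransmit to and can drop a packet from its retransmission memory as soon as it holds it.

```python
class HopNode:
    """Hypothetical hop node of a generic tree-based protocol."""

    def __init__(self, name, children):
        self.name = name
        self.children = frozenset(children)  # at most B children
        self.memory = {}        # seq -> payload, kept for local retransmission
        self.hacks = {}         # seq -> children that have HACKed seq
        self.released = set()
        self.to_parent = []     # HACKs sent one level up the tree

    def on_packet(self, seq, payload):
        # HACK the parent at once; do not wait for the subtree to acknowledge.
        self.to_parent.append((self.name, seq))
        if seq not in self.released:
            self.memory[seq] = payload
            self._maybe_release(seq)

    def on_hack(self, seq, child):
        if seq in self.released:
            return              # late duplicate HACK
        # A child's HACK may arrive before this node itself receives seq.
        self.hacks.setdefault(seq, set()).add(child)
        self._maybe_release(seq)

    def missing(self, seq):
        """Children whose HACK for seq is still outstanding."""
        if seq in self.released:
            return set()
        return set(self.children - self.hacks.get(seq, set()))

    def _maybe_release(self, seq):
        # Free memory only when every child has HACKed.
        if seq in self.memory and self.hacks.get(seq, set()) >= self.children:
            del self.memory[seq]
            self.hacks.pop(seq, None)
            self.released.add(seq)


h = HopNode("h", ["c1", "c2"])
h.on_packet(0, "x")
h.on_hack(0, "c1")
print(h.missing(0))     # {'c2'}: packet 0 stays in memory
h.on_hack(0, "c2")
print(0 in h.memory)    # False
```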

To simplify our analysis, we assume that the source and group leaders control the retransmission timeouts; however, such timeouts can also be controlled by the children of the source and group leaders. Accordingly, when the source sends a packet, it sets a timer, and each hop node sets a timer as it becomes aware of a new packet. If there is a timeout before all HACKs have been received, the packet is assumed to be lost and is retransmitted by the source or group leader to its children. We assume that a selective repeat strategy is used, so that once a packet is received correctly, it is never rebroadcast to the group again. Because our analysis focuses on the maximum attainable throughput of protocol classes, we assume that the ACK tree perfectly mirrors the routing tree created by the underlying routing protocol.

The first application of tree-based protocols to reliable multicasting over the Internet was reported by Paul et al. [13], who compare three basic schemes for reliable point-to-multipoint multicasting using hierarchical structures. Their results have been fully developed as the reliable multicast transport protocol (RMTP) [14]. While our generic protocol sends a HACK for every packet sent by the source, RMTP sends HACKs only periodically, so as to conserve bandwidth. RMTP has been implemented on several platforms and has been used successfully in AT&T's call detail data distribution network [15].

Tree-based protocols eliminate the ACK implosion problem, free the source from having to know the receiver set, work with finite memory, provide bounded maximum end-to-end delays, and operate solely on messages exchanged in local groups (between a node and its children in the ACK tree). As we show in Chapter 4, the amount of work required at each node for tree-based protocols does not increase with the number of group members, i.e., the throughput of such protocols does not depend on the number of group members.
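A standard probability result helps explain why the work per node depends on B rather than on the total number of receivers. Assuming, for this sketch only, that each copy of a packet is lost independently at each receiver with probability p, the number of transmissions M needed until all b receivers in a group hold the packet satisfies P(M ≥ m) = 1 - (1 - p^(m-1))^b, so E[M] is the sum over m ≥ 1 of 1 - (1 - p^(m-1))^b. The function below evaluates this series numerically; in a tree, b is bounded by B, whereas in a flat sender-initiated scheme b is the whole receiver set.

```python
def expected_transmissions(b: int, p: float, tol: float = 1e-12) -> float:
    """E[M] for one packet sent to b receivers with independent loss probability p.

    Uses E[M] = sum over m >= 1 of P(M >= m), where
    P(M >= m) = 1 - (1 - p**(m - 1))**b.
    """
    if b < 1 or not 0.0 <= p < 1.0:
        raise ValueError("need b >= 1 and 0 <= p < 1")
    total, m = 0.0, 1
    while True:
        tail = 1.0 - (1.0 - p ** (m - 1)) ** b   # P(M >= m)
        total += tail
        if tail < tol:
            return total
        m += 1


for group in (3, 30, 3000):
    print(group, round(expected_transmissions(group, 0.05), 3))
```

Holding p fixed, the printed values grow with the group size, which is the cost a flat scheme pays and a tree bounds by choosing B.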

We define a tree-NAPP protocol as a tree-based protocol that uses NAK avoidance and periodic polling [10] in the local groups. NAKs alone are not sufficient to guarantee reliability with finite memory, so receivers send a periodic positive (hierarchical) acknowledgment to their parents to advance the mw. Note that messages sent for the setting of timers needed for NAK avoidance are limited to the local group, which is scalable. The tree-based multicast transport protocol (TMTP) [16] is the only specification of a tree-NAPP protocol to date.

2.5 Ring-Based Protocols

Token-ring-based protocols for reliable multicast were originally developed to support applications that require an atomic and total ordering of transmissions at all receivers. One of the first proposals for reliable multicasting is the token ring protocol (TRP) [1]; its aim was to combine the throughput advantages of NAKs with the reliability of ACKs. The Reliable Multicast Protocol (RMP) [17] describes an updated WAN version of TRP. Although multiple rings are used in a naming hierarchy, the same class of protocol is used for the actual rings. Therefore, RMP has the same throughput bounds as TRP.

We base our description of generic ring-based protocols on the LAN protocol TRP and the WAN protocol RMP. The basic premise is to have only one token site responsible for ACKing packets back to the source. The source times out and retransmits packets if it does not receive an ACK from the token site within a timeout period. The ACK also serves to timestamp packets, so that all receiver nodes have a global ordering of the packets for delivery to the application layer. The protocol does not allow receivers to deliver packets until the token site has multicast its ACK.

Receivers send NAKs to the token site for selective repeat of lost packets that were originally multicast from the source. The ACK sent back to the source also serves as a token-passing mechanism. If no transmissions from the source are available to piggyback the token, then a separate unicast message is sent. Since we are interested in the maximum throughput, we do not consider the latter case in this thesis. The token is not passed to the next member of the ring of receivers until the new site has correctly received all packets that the former site has received. Once the token is passed, a site may clear packets from memory; accordingly, the final deletion of packets from the collective memory of the receiver set is decided by the token site, and is conditional on passing the token. The source deletes packets only when an ACK/token is received. Note that both TRP and RMP specify that retransmissions are sent unicast from the token site. Because our analysis focuses on the maximum attainable throughput of protocol classes, we assume that the token is passed exactly once per message.
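The token rules can be summarized in a deliberately simplified model, assuming (as above) one token pass per message and treating the ACK and the token pass as a single step. `TokenRing`, its methods, and the use of `retransmit` as a stand-in for NAK-driven unicast repair are hypothetical; this is not TRP or RMP code. The sketch shows the two conditions from the text: the token site must hold the packet it ACKs, and the token moves only when the next site holds everything the current site holds.

```python
class TokenRing:
    """Simplified ring-based session (hypothetical model, one token pass per message)."""

    def __init__(self, n_sites: int):
        self.n = n_sites
        self.token_site = 0
        self.have = [set() for _ in range(n_sites)]  # packets held by each site
        self.order = []            # global order fixed by the token site's ACKs
        self.source_memory = set()

    def multicast(self, pkt, reached):
        """Source multicasts pkt; only the sites in `reached` receive it."""
        self.source_memory.add(pkt)
        for site in reached:
            self.have[site].add(pkt)

    def retransmit(self, site, pkt):
        """Stand-in for a NAK-triggered unicast repair delivered to `site`."""
        self.have[site].add(pkt)

    def ack(self, pkt) -> bool:
        """Token site ACKs pkt, timestamps it, and passes the token if allowed."""
        site = self.token_site
        if pkt not in self.have[site]:
            return False           # token site must first recover pkt (via a NAK)
        nxt = (site + 1) % self.n
        if not self.have[site] <= self.have[nxt]:
            return False           # token waits until the next site has caught up
        self.order.append(pkt)     # the ACK's timestamp fixes the global order
        self.source_memory.discard(pkt)   # source frees pkt when the ACK/token arrives
        self.token_site = nxt
        return True


ring = TokenRing(3)
ring.multicast("p0", reached={0, 1, 2})
print(ring.ack("p0"))        # True: token moves from site 0 to site 1
ring.multicast("p1", reached={0, 2})
print(ring.ack("p1"))        # False: site 1 lost p1 and must NAK for it
ring.retransmit(1, "p1")
print(ring.ack("p1"), ring.order, ring.source_memory)   # True ['p0', 'p1'] set()
```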

3. Protocol Correctness

A protocol is considered correct if it is shown to be both safe and live [18]. For any reliable multicast protocol to be considered safe, it must deliver all data in order to a higher layer at the receivers. To be live, the reliable multicast protocol must never reach a state in which data can no longer be delivered to a higher layer at any of the receivers. To address the correctness of protocol classes, we assume that nodes never fail during a reliable multicast session. Therefore, our analysis of correctness focuses on the ability of the protocol classes to sustain packet losses or errors. We assume that there exists some non-zero probability that a packet is received error-free, and that all senders and receivers have finite memory. Extensions of the generic tree-based protocols that ensure liveness and safety when nodes can fail are discussed by Levine, Lavo, and Garcia-Luna-Aceves [19].

The proof of correctness for ring-based protocols is given by Chang and Maxemchuk [1]. The proof that sender-initiated unicast protocols are safe and live is available from many sources (e.g., see Bertsekas and Gallager [18]). The proof does not change significantly for the sender-initiated class of reliable multicast protocols and is omitted for brevity. The safety property at each receiver is not violated, because each node can store a counter of the sequence number of the next packet to be delivered to a higher layer. The liveness proof is also essentially the same, because the source waits for ACKs from all members of the receiver set before sliding the cw forward. Theorems 1 and 2 below demonstrate that the generic tree-based reliable multicast protocol (TRMP for short) is correct, and that the class of receiver-initiated reliable multicast protocols proposed to date is not live.
Theorem  1  TRMP  is  safe  and  live. 

Proof:  Let  R  be  the  set  of  all  the  nodes  that  belong  to  the  reliable  multicast  session, 
including  a  source  s.  The  receivers  in  the  set  are  organized  into  a  B- ary  tree  of  height  h. 
The  proof  proceeds  by  induction  on  h. 

For the case in which h = 1, TRMP reduces to a non-hierarchical sender-initiated scheme of R = B + 1 nodes, with each of the B receivers practicing a given retransmission strategy with the source. Therefore, the proof follows from the correctness proof of unicast retransmission protocols presented by Bertsekas and Gallager [18].

For h > 1, assume the theorem holds for any t such that 1 ≤ t < h. We must prove that the theorem holds for t = h.

Safety: We must prove that each receiver of a tree of height t delivers all data in order to a higher layer. Each receiver keeps a variable storing the sequence number of the packet to be delivered next. Only the first error-free copy of the packet with that sequence number, whether transmitted by the source or retransmitted by the group leader, is delivered, after which the variable is incremented. This procedure continues until the session ends. Therefore, TRMP is safe.
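The counter argument in the safety proof can be illustrated with a minimal receiver sketch. The class and method names (`InOrderReceiver`, `on_packet`) are hypothetical, and the buffer for out-of-order packets reflects the selective-repeat strategy assumed in Section 4.1; the proof itself needs only the next-sequence counter.

```python
class InOrderReceiver:
    """Delivers each sequence number exactly once, in order (hypothetical sketch)."""

    def __init__(self) -> None:
        self.next_seq = 0      # sequence number of the packet to be delivered next
        self.buffer = {}       # error-free packets that arrived ahead of next_seq
        self.delivered = []    # what the higher layer has received, in order

    def on_packet(self, seq: int, payload: str, error_free: bool = True) -> None:
        # A copy may come from the source or from the group leader; only the first
        # error-free copy of a sequence number not yet delivered is kept.
        if not error_free or seq < self.next_seq or seq in self.buffer:
            return
        self.buffer[seq] = payload
        # Deliver every packet that is now contiguous with next_seq, then advance it.
        while self.next_seq in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1


if __name__ == "__main__":
    r = InOrderReceiver()
    # p1 arrives early, p0 is first corrupted and later retransmitted, p1 is duplicated.
    for seq, data, ok in [(1, "b", True), (0, "a", False), (1, "b", True),
                          (0, "a", True), (2, "c", True)]:
        r.on_packet(seq, data, ok)
    print(r.delivered)
```

Duplicates and corrupted copies are discarded, so the delivered sequence is always a gap-free prefix of the transmitted one.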

Liveness: We must prove that no member of a tree of height t ever reaches a deadlock. Consider the subtree that starts at the source and includes all nodes of the tree up to height t − 1; the leaves of this subtree are hop nodes in the larger tree, i.e., group leaders of the nodes at the bottom of the larger tree. By the inductive hypothesis, the liveness property holds in this subtree. It remains to show that TRMP is live for a second subset of nodes, consisting of the leaves of the larger tree and their hop-node parents. Each group in this second subset follows the same protocol, so it suffices to prove that an arbitrary group is live.

The arbitrary group in the second subset of the tree constitutes a case of sender-initiated reliable multicast, the only difference being that the original transmission is sent by the source (external to the group), not by the group leader. Since the availability of a retransmission from the group leader is guaranteed by the inductive hypothesis, each group is live; therefore, the entire tree is live. QED
Theorem  2  A  receiver-initiated  reliable  protocol  is  not  live. 

Proof: The proof is by counterexample, focusing on the sender and an arbitrary member of the receiver set R (where R > 1).

• The sender node, X, has enough memory to store up to M packets.

• Each packet takes 1 unit of time to reach a receiver node Y. NAKs take a finite amount of time to reach the sender.

• Let $p_i$ denote the ith packet, with i beginning at zero. Packet $p_0$ is sent at time 0, but it is lost in the network.

• X sends the next M − 1 packets to Y successfully.

• Y sends a NAK stating that $p_0$ was not received. The NAK is either lost or reaches the sender after time M, when the sender decides to send out packet $p_M$.

• Since X can store only up to M packets, and it has not received any NAK for $p_0$ by time M, it must clear $p_0$, assuming that it has been received correctly.

• X then receives the NAK for $p_0$ at time M + ε and becomes deadlocked, unable to retransmit $p_0$; Y can therefore never deliver $p_0$, or any later packet in order, to the higher layer. QED
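The trace in the proof can be replayed in a few lines. This is a toy model under our own assumptions (FIFO eviction of the oldest packet when memory is full, and the hypothetical function `nak_can_be_served`), not code from any receiver-initiated protocol: packet $p_t$ is sent at time t, and the NAK for $p_0$ arrives at a chosen time.

```python
from collections import OrderedDict


def nak_can_be_served(M: int, nak_arrival: float) -> bool:
    """Replay the trace of Theorem 2 for a sender that can hold M packets.

    Packet p_t is sent at time t (p_0 is lost). When the buffer is full and no NAK
    has arrived, the sender evicts its oldest packet, assuming it was delivered.
    Returns True if p_0 is still buffered when the NAK for p_0 arrives.
    """
    buffer = OrderedDict()
    for t in range(M + 1):
        if nak_arrival <= t:               # the NAK arrives before p_t is sent
            return 0 in buffer
        if len(buffer) == M:               # memory full, no NAK seen for the oldest packet
            buffer.popitem(last=False)     # clear it (in this trace, p_0)
        buffer[t] = f"p{t}"
    return 0 in buffer                     # the NAK arrives after time M (or never)


if __name__ == "__main__":
    print(nak_can_be_served(4, 2.5))           # NAK in time: p_0 can be retransmitted
    print(nak_can_be_served(4, 4.01))          # NAK at M + epsilon: p_0 is gone
    print(nak_can_be_served(4, float("inf")))  # NAK lost: same outcome
```

Any finite M admits a NAK delay that lands after the eviction, which is why the argument concludes that finite memory cannot guarantee liveness without some positive release signal.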

The above indicates that the receiver-initiated protocols proposed to date require infinite memory to work correctly. In practice, this requirement implies that the source must keep in memory every packet that it sends during the lifetime of a session. This becomes impractical in long-lived sessions or in sessions in which the likelihood of lost packets or NAKs is not negligible. Fortunately, the next section shows that tree-based protocols, which we have shown to work correctly with finite memory, provide all the scaling benefits of receiver-initiated protocols.
4. Maximum Throughput Analysis

4.1  Assumptions 

To  analyze  the  maximum  throughput  that  each  of  the  generic  reliable  multicast  protocols 
introduced  in  Chapter  2  can  achieve,  we  use  the  same  model  used  by  Pingali  et  al.  [2,  3], 
which  focuses  on  the  processing  requirements  of  generic  reliable  multicast  protocols,  rather 
than  the  communication  bandwidth  requirements.  Accordingly,  the  maximum  throughput 
of a generic protocol is a function of the per-packet processing rate at the sender and
receivers,  and  the  analysis  focuses  on  obtaining  the  processing  times  per  packet  at  a  given 
node. 

We  assume  a  single  sender,  X,  multicasting  to  R  identical  receivers.  The  probability  of 
packet  loss  is  p  for  any  node.  Figure  1  summarizes  all  the  notation  used  in  this  chapter. 
For  clarity,  we  assume  a  single  ACK  tree  rooted  at  the  source  in  the  analysis  of  tree-based 
protocols. A selective-repeat retransmission strategy is assumed in all the protocol classes, since it is the retransmission strategy with the highest throughput [18], and its requirement of keeping buffers at the receivers is a non-issue given the small cost of memory. Assumptions specific to each protocol are listed in Chapter 2 and are chosen to model maximum throughput.

We  make  two  additional  assumptions:  (1)  all  loss  events  at  any  node  in  the  multicast 
of  a  packet  are  mutually  independent,  and  (2)  no  acknowledgments  are  ever  lost.  Our 
assumptions  clearly  fail  to  model  real  systems  accurately  but  greatly  increase  the  tractability 
of  the  model. 

Multicast routing protocols such as CBT, PIM, and DVMRP [8, 9, 20] organize routers into trees, which means that packet losses at different receivers are correlated. Our first assumption is equivalent to a scenario in which packet losses at receivers are uncorrelated with the location of those receivers in the underlying multicast routing tree of the source. We argue that the results of our analysis constitute a lower bound on the maximum throughput of any protocol class that can take advantage of the relative position of receivers in the multicast routing tree for the transmission of ACKs or NAKs. We have not given any class an advantage with this assumption.

Our second assumption benefits all classes, but especially favors protocols that multicast acknowledgments. For example, NAK-avoidance is most effective if all receivers are guaranteed to receive the first NAK multicast to the receiver set. As the number of nodes involved in NAK-avoidance increases, the probability that a NAK is delivered successfully to all receivers decreases. Both RINA and tree-NAPP protocols are favored by the assumption, but RINA protocols much more so, because the assumption exaggerates the probability of delivering NAKs successfully to all receivers. Tree-NAPP protocols also benefit from the assumption, but the number of receivers involved in the exchange of NAKs is bounded by the size of the local groups, and therefore the advantage of assuming perfect NAK transmissions is limited. Even with this handicap, our analysis shows that tree-NAPP protocols are better than RINA protocols.
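To make the effect of the second assumption concrete: under the independent-loss model, a NAK multicast by one receiver reaches each of the other n − 1 receivers with probability 1 − p, and hence all of them with probability $(1-p)^{n-1}$. The sketch below (function name hypothetical; p = 0.05 and B = 10 chosen only for illustration) compares n = R, as in RINA, with n = B, a local tree-NAPP group.

```python
def prob_nak_reaches_all(n: int, p: float) -> float:
    """Probability that one multicast NAK reaches the other n - 1 receivers,
    assuming each is lost independently with probability p."""
    return (1.0 - p) ** (n - 1)


if __name__ == "__main__":
    p, B = 0.05, 10
    for R in (10, 100, 1000):
        print(f"R={R:5d}  RINA (all R receivers): {prob_nak_reaches_all(R, p):.3g}  "
              f"local group of B={B}: {prob_nak_reaches_all(B, p):.3g}")
```

The local-group probability is fixed by B, while the RINA probability decays geometrically in R, which is why assuming perfect NAK delivery inflates RINA far more than tree-NAPP.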

Table  1  summarizes  the  bounds  on  maximum  throughput  for  all  the  known  classes  of 
reliable  multicast  protocols.  Our  results  clearly  show  that  tree-NAPP  protocols  constitute 
the  most  scalable  alternative. 

4.2 Sender- and Receiver-Initiated Protocols

Following the notation introduced by Pingali et al. [2, 3], we place a superscript A on any variable related to the sender-initiated protocol, and N1 and N2 on variables related to the receiver-initiated and RINA protocols, respectively. For a constant stream of packets to R receivers, the maximum throughputs of these protocols satisfy [2]:

$$1/\Lambda^{A} \in O\left(R\left(1 + \frac{p\ln R}{1-p}\right)\right), \quad (4.1)$$

$$1/\Lambda^{N1} \in O\left(1 + \frac{pR}{1-p}\right), \quad (4.2)$$

$$1/\Lambda^{N2} \in O\left(1 + \frac{p\ln R}{1-p}\right). \quad (4.3)$$

Even as the probability of packet loss goes to zero, the throughput of the sender-initiated protocol decreases with R, the size of the receiver set, because an ACK must be
$B$: Branching factor of a tree; the group size.
$R$: Size of the receiver set.
$X_f$: Time to feed in a new packet from the higher protocol layer.
$X_p$: Time to process the transmission of a packet.
$X_a$, $X_n$, $X_h$: Times to process the reception of an ACK, NAK, or HACK, respectively.
$X_t$, $Y_t$: Time to process a timeout at a sender or receiver node, respectively.
$Y_p$: Time to process a newly received packet.
$Y_f$: Time to deliver a correctly received packet to a higher layer.
$Y_a$, $Y_n$, $Y_h$: Times to process and transmit an ACK, NAK, or HACK, respectively.
$p$: Probability of loss at a receiver; losses at different receivers are assumed to be independent events.
$L_r^{H1}$: Number of HACKs sent by receiver r per packet using a tree-based protocol.
$L_r^{U}$: Number of ACKs sent by a receiver r per packet using a unicast protocol.
$L^{H1}$: Total number of HACKs received from all receivers per packet.
$M_r$: Number of transmissions necessary for receiver r to successfully receive a packet.
$M$: Number of transmissions for all receivers to receive the packet correctly; $M = \max_r \{M_r\}$.
$X^w$, $Y^w$: Processing time per packet at the sender and receiver, respectively, in protocol $w \in \{A, N1, N2, H1, H2, R\}$.
$H^{H1}$, $H^{H2}$: Processing time per packet at a hop node in tree-based and tree-NAPP protocols, respectively.
$T^R$: Processing time per packet at the token site in ring-based protocols.
$\Lambda_x^w$: Throughput for protocol $w \in \{A, N1, N2, H1, H2, R\}$, where x is one of the source s, receiver (leaf) r, hop node h, or token site t. No subscript denotes overall system throughput.
$X^{\phi}$, $Y^{\phi}$: Times to process the reception and transmission, respectively, of a periodic HACK.

Figure 1: Notation.
Protocol                                Processor requirements                            p constant    p → 0
Sender-initiated [2]                    O(R(1 + p ln R/(1 − p)))                          O(R ln R)     O(R)
Receiver-initiated, NAK avoidance [2]   O(1 + p ln R/(1 − p))                             O(ln R)       O(1)
Ring-based (unicast retrans.)           O(1 + pR/(1 − p))                                 O(R)          O(1)
Tree-based                              O(B(1 − p) + pB ln B)                             O(1)          O(1)
Tree-NAPP                               O(1 + (1 − p + p ln B + p²(1 − 4p))/(1 − p))      O(1)          O(1)

Table 1: Analytical bounds.


sent by every receiver to the source once a transmission is correctly received. In contrast, as p goes to zero, the throughput of receiver-initiated protocols becomes independent of the number of receivers. Notice, however, that when the probability of error is not negligible, the throughput of a receiver-initiated protocol decreases with R, the number of receivers (Eq. 4.2), or with ln R when NAK-avoidance is used (Eq. 4.3).
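To see the scaling in Table 1 concretely, the sketch below evaluates the expression inside each O(·) with every hidden constant set to 1. The constants and the group size B = 10 are assumptions for illustration, so the numbers indicate growth with R, not actual processing costs.

```python
import math


def table1_orders(R: int, p: float, B: int = 10) -> dict:
    """Expressions inside the O(.) bounds of Table 1, hidden constants set to 1."""
    q = 1.0 - p
    return {
        "sender-initiated": R * (1 + p * math.log(R) / q),
        "receiver-initiated (NAK avoidance)": 1 + p * math.log(R) / q,
        "ring-based": 1 + p * R / q,
        "tree-based": B * q + p * B * math.log(B),
        "tree-NAPP": 1 + (q + p * math.log(B) + p ** 2 * (1 - 4 * p)) / q,
    }


if __name__ == "__main__":
    for R in (10, 1_000, 100_000):
        row = table1_orders(R, p=0.1)
        print(R, {name: round(value, 2) for name, value in row.items()})
```

Only the first three entries change with R; the two tree-based entries depend on B and p alone.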

4.3  Tree-Based  Protocols 

We denote this class of protocols simply by H1 and use that superscript in all variables related to the protocol class. In the following, we derive and bound the expected cost at each type of node and then consider the overall system throughput. To make use of symmetry, we assume, without loss of generality, that there are enough receivers to form a full tree at each level.

4.3.1  Source  node 

We consider first $X^{H1}$, the processing cost required by the source to successfully multicast an arbitrarily chosen packet to all receivers using the H1 protocol. The processing requirement for an arbitrary packet can be expressed as a sum of costs:

$X^{H1}$ = (initial transmission) + (retransmissions) + (receiving HACKs)

$$X^{H1} = X_f + X_p(1) + \sum_{m=2}^{M}\left(X_t(m) + X_p(m)\right) + \sum_{i=1}^{L^{H1}} X_h(i), \quad (4.4)$$

where $X_f$ is the time to get a packet from a higher layer, $X_p(m)$ is the time taken on attempt m at successful transmission of the packet, $X_t(m)$ is the time to process a timeout interrupt for transmission attempt m, $X_h(i)$ is the time to process HACK i, M is the number of transmissions that the source will have to make for this packet, and $L^{H1}$ is the number of HACKs received using the H1 protocol. Taking expectations, we have

$$E[X^{H1}] = E[X_f] + E[M]\,E[X_p] + (E[M] - 1)\,E[X_t] + E[L^{H1}]\,E[X_h]. \quad (4.5)$$

What we have derived so far is very similar to Equations (1) and (2) in the analysis by Pingali et al. [2]. In fact, we can reuse all of that analysis, with the understanding that B is the size of the receiver subset from which the source collects HACKs. Therefore, the expected number of HACKs received at the sender is

$$E[L^{H1}] = E[M]\,B\,(1 - p). \quad (4.6)$$

Substituting Equation 4.6 into Equation 4.5, we can rewrite the expected cost at the source node as

$$E[X^{H1}] = E[X_f] + E[M]\,E[X_p] + (E[M] - 1)\,E[X_t] + E[M]\,B\,(1 - p)\,E[X_h]. \quad (4.7)$$

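Because in H1 each sender transmits to a local group of B receivers, setting R = B in the analysis of [10, 2] gives the expected number of transmissions per packet:

$$E[M] = \sum_{i=1}^{B} \binom{B}{i} (-1)^{i+1} \frac{1}{1 - p^{i}}. \quad (4.8)$$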
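Eq. 4.8 can be evaluated directly. The sketch below (function names are our own) uses exact rational arithmetic, because the alternating binomial sum loses precision in floating point as B grows, and cross-checks it against the equivalent tail-sum form $E[M] = \sum_{m \ge 0} \Pr\{M > m\} = \sum_{m \ge 0}\left[1 - (1 - p^m)^B\right]$, which follows from $\Pr\{M_r \le m\} = 1 - p^m$ and the independence assumption of Section 4.1.

```python
from fractions import Fraction
from math import comb


def expected_transmissions(B: int, p: float) -> float:
    """E[M] from Eq. (4.8): sum_{i=1}^{B} C(B, i) (-1)^(i+1) / (1 - p^i), 0 <= p < 1.

    Exact rational arithmetic avoids the cancellation that the alternating
    binomial sum suffers in floating point when B is large.
    """
    q = Fraction(p)
    total = Fraction(0)
    for i in range(1, B + 1):
        total += Fraction(comb(B, i) * (-1) ** (i + 1)) / (1 - q ** i)
    return float(total)


def expected_transmissions_tail(B: int, p: float, tol: float = 1e-15) -> float:
    """Cross-check: E[M] = sum_{m>=0} P(M > m) = sum_{m>=0} [1 - (1 - p^m)^B]."""
    total, m = 0.0, 0
    while True:
        term = 1.0 - (1.0 - p ** m) ** B
        total += term
        if term < tol:
            return total
        m += 1


if __name__ == "__main__":
    for B in (1, 5, 20):
        print(B, expected_transmissions(B, 0.1), expected_transmissions_tail(B, 0.1))
```

For B = 1 both reduce to 1/(1 − p), the mean of a single geometric receiver.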


Pingali et al. [3, 2] provided a bound on E[M] using the following inequalities:

$$\frac{H_B}{-\ln p} \le E[M] \le 1 + \frac{H_B}{-\ln p}, \quad (4.9)$$
where $H_B = \sum_{i=1}^{B} 1/i$ are the harmonic numbers. From the known inequality

>n[1+1,)7Lrh' 

it  follows  that 

l  /  P~  1 
—  lnp  <  - . 

P 

Using this, taking into account that $H_B \in O(\ln B)$, assuming that all operations (e.g., $X_f$ and $X_p$) are of constant cost, and given that R = B in H1, it can be shown that

$$E[M] \in O\left(1 + \frac{p}{1-p}\ln B\right). \quad (4.10)$$
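The bracket of Eq. 4.9 can be checked numerically. A minimal sketch, using the tail-sum form of E[M] (helper names hypothetical), prints the bounds and E[M] for growing B; the logarithmic growth in B that it shows is what enters Eq. 4.10 through $H_B \in O(\ln B)$.

```python
import math


def expected_transmissions(B: int, p: float, tol: float = 1e-15) -> float:
    """E[M] = sum_{m>=0} [1 - (1 - p^m)^B], equivalent to Eq. (4.8)."""
    total, m = 0.0, 0
    while True:
        term = 1.0 - (1.0 - p ** m) ** B
        total += term
        if term < tol:
            return total
        m += 1


def harmonic(B: int) -> float:
    return sum(1.0 / i for i in range(1, B + 1))


def eq_4_9_bounds(B: int, p: float) -> tuple:
    """Lower and upper bounds of Eq. (4.9) on E[M], valid for 0 < p < 1."""
    x = harmonic(B) / -math.log(p)
    return x, 1.0 + x


if __name__ == "__main__":
    p = 0.1
    for B in (10, 100, 1_000, 10_000):
        lo, hi = eq_4_9_bounds(B, p)
        print(f"B={B:6d}  {lo:.3f} <= E[M]={expected_transmissions(B, p):.3f} <= {hi:.3f}")
```

The bounds follow from comparing the decreasing tail sum with its integral, $\int_0^\infty [1 - (1 - p^x)^B]\,dx = H_B/(-\ln p)$, so the two sides differ by at most the first term, which equals 1.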

Using Equation 4.10, we can bound Equation 4.7 as follows:

$$E[X^{H1}] \in O\left(B\left(1 + \frac{p}{1-p}\ln B\right)(1-p)\right) = O\left(B(1-p) + Bp\ln B\right). \quad (4.11)$$

It then follows that, when p is a constant, $E[X^{H1}] \in O(B\ln B)$.

4.3.2  Leaf  nodes 

Let $Y^{H1}$ denote the processing requirement on nodes that do not have to forward packets (leaves). Notice that leaf nodes in the H1 protocol process fewer retransmissions, and thus send fewer acknowledgments, than receivers in the A protocol. We can again use an analysis similar to the one by Pingali et al. [2] for receivers using a sender-initiated protocol.

$Y^{H1}$ = (receiving transmissions) + (sending HACKs back)

$$Y^{H1} = \sum_{i=1}^{L_r^{H1}}\left(Y_p(i) + Y_h(i)\right) + Y_f, \quad (4.12)$$

where $Y_p(i)$ is the time it takes to process (re)transmission i, $Y_h(i)$ is the time it takes to send HACK i, $Y_f$ is the time to deliver a packet to a higher layer, and $L_r^{H1}$ is the number of HACKs generated by this node r (i.e., the number of transmissions correctly received). Since each receiver is sent M transmissions, each of which is lost with probability p, we obtain


$$E[L_r^{H1}] = E[M](1 - p). \quad (4.13)$$
Taking expectations of Equation 4.12 and substituting Equation 4.13, we have

$$E[Y^{H1}] = E[L_r^{H1}]\left(E[Y_p] + E[Y_h]\right) + E[Y_f] = E[M](1-p)\left(E[Y_p] + E[Y_h]\right) + E[Y_f]. \quad (4.14)$$

Again, noting the bound on E[M] given in Equation 4.10,

$$E[Y^{H1}] \in O(1 - p + p\ln B). \quad (4.15)$$

When p is treated as a constant, $E[Y^{H1}] \in O(\ln B)$.
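Equations 4.6 and 4.13 rest on Wald's identity: whether transmission m is sent depends only on earlier loss outcomes, so the expected number of correct receptions at a leaf over M transmissions is E[M](1 − p). A small Monte Carlo sketch of one local group (the function `simulate_group` and its parameters are hypothetical) can check this under the loss model of Section 4.1.

```python
import random


def simulate_group(B: int, p: float, trials: int, seed: int = 1):
    """Monte Carlo estimate of E[M] and of the HACKs sent per leaf, E[L_r^{H1}].

    Assumed model (Section 4.1): each (re)transmission is lost at each of the B
    leaves independently with probability p; the sender keeps retransmitting until
    every leaf has the packet, and a leaf sends one HACK per copy received correctly.
    """
    rng = random.Random(seed)
    total_m = 0
    total_hacks = 0
    for _ in range(trials):
        received = [False] * B
        m = 0
        while not all(received):
            m += 1
            for r in range(B):
                if rng.random() >= p:          # copy m reaches leaf r error-free
                    received[r] = True
                    total_hacks += 1
        total_m += m
    mean_m = total_m / trials
    mean_hacks_per_leaf = total_hacks / (trials * B)
    return mean_m, mean_hacks_per_leaf


if __name__ == "__main__":
    p = 0.2
    m, hacks = simulate_group(B=10, p=p, trials=20_000)
    # Eq. (4.13) predicts HACKs per leaf close to E[M](1 - p); the sender sees B times that.
    print(f"E[M] ~ {m:.3f}, HACKs per leaf ~ {hacks:.3f}, E[M](1-p) ~ {m * (1 - p):.3f}")
```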


4.3.3  Hop  nodes 

To evaluate the processing requirement at a hop node h, we note that a node between the source and a node with no children has two jobs: to receive packets and to retransmit them. Because a hop node is both a sender and a receiver, it is convenient to express its costs in terms of X and Y. Our sum of costs is


$H^{H1}$ = (receiving transmissions) + (sending HACKs back) + (collecting HACKs from children) + (retransmissions to children). (4.16)


Just as in the case of the source node, $L^{H1}$ is the number of HACKs received from node h's children for this packet, and $L_h^{H1}$ is the number of HACKs generated by node h. Taking expectations, we obtain


$$E[H^{H1}] = E[L_h^{H1}]\left(E[Y_p] + E[Y_h]\right) + E[Y_f] + (E[M] - 1)\left(E[X_p] + E[X_t]\right) + E[L^{H1}]\,E[X_h]. \quad (4.17)$$


We can substitute Equations 4.6 and 4.13 into Equation 4.17 to obtain

$$E[H^{H1}] = E[M](1-p)\left(E[Y_p] + E[Y_h]\right) + E[Y_f] + (E[M] - 1)\left(E[X_p] + E[X_t]\right) + B\,E[M](1-p)\,E[X_h]. \quad (4.18)$$
The first two terms equal the processing requirement of a leaf node, and the last two are almost the cost at the source node. Substituting Equations 4.7 and 4.14 and subtracting the difference yields

$$E[H^{H1}] = E[Y^{H1}] + E[X^{H1}] - E[X_f] - E[X_p]. \quad (4.19)$$

In other words, the cost at a hop node equals the sum of the costs at a source and at a leaf, minus the cost of receiving the data from a higher layer and of one transmission (the original one). Substituting Equations 4.11 and 4.15 into Equation 4.19, we have

$$E[H^{H1}] \in O(1 - p + p\ln B) + O\left(B(1-p) + Bp\ln B\right) = O\left(B(1-p) + Bp\ln B\right). \quad (4.20)$$

When p is a constant, $E[H^{H1}] \in O(B\ln B)$, which is the dominant term in the throughput analysis of the overall system.

4.3.4  Overall  system  analysis 

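Let the throughput at the sender, $\Lambda_s^{H1}$, be $1/E[X^{H1}]$; at the hop nodes, $\Lambda_h^{H1}$, be $1/E[H^{H1}]$; and at the leaf nodes, $\Lambda_r^{H1}$, be $1/E[Y^{H1}]$. The throughput of the overall system is

$$\Lambda^{H1} = \min\{\Lambda_s^{H1}, \Lambda_h^{H1}, \Lambda_r^{H1}\}. \quad (4.21)$$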

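From Equations 4.11, 4.15, and 4.20 it follows that

$$1/\Lambda^{H1} \in O\left(B(1-p) + Bp\ln B\right). \quad (4.22)$$

If p is a constant, and if p → 0, we obtain, respectively,

$$1/\Lambda^{H1} \in O(B\ln B) = O(1); \quad p \text{ constant}, \quad (4.23)$$

$$1/\Lambda^{H1} \in O(B) = O(1); \quad p \to 0, \quad (4.24)$$

where O(1) is with respect to the number of receivers R, because the branching factor B is fixed independently of R.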


Therefore, the maximum throughput of this protocol, as well as its throughput with non-negligible packet loss, is independent of the number of receivers. Tree-based protocols (including the tree-NAPP protocols of Section 4.4) are the only class of reliable multicast protocols that exhibits such a degree of scalability with respect to the number of receivers.
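Equations 4.7, 4.14, 4.18, and 4.19 can be evaluated together. The sketch below assumes every mean processing time equals a constant c (an assumption made only for illustration, as in Chapter 5) and uses hypothetical helper names. It checks the identity of Eq. 4.19 and computes the system throughput of Eq. 4.21 as the reciprocal of the largest per-node cost.

```python
import math


def expected_m(B: int, p: float, tol: float = 1e-15) -> float:
    """E[M] via the tail sum sum_{m>=0} [1 - (1 - p^m)^B], equivalent to Eq. (4.8)."""
    total, m = 0.0, 0
    while True:
        term = 1.0 - (1.0 - p ** m) ** B
        total += term
        if term < tol:
            return total
        m += 1


def h1_costs(B: int, p: float, c: float = 1.0):
    """Mean per-packet costs (source, leaf, hop) of H1 when every mean time equals c.

    R is not an argument: in this model the per-node cost depends only on the
    group size B and the loss probability p.
    """
    em = expected_m(B, p)
    source = c + em * c + (em - 1) * c + em * B * (1 - p) * c          # Eq. (4.7)
    leaf = em * (1 - p) * (c + c) + c                                   # Eq. (4.14)
    hop = (em * (1 - p) * (c + c) + c + (em - 1) * (c + c)
           + B * em * (1 - p) * c)                                      # Eq. (4.18)
    return source, leaf, hop


if __name__ == "__main__":
    for B in (4, 8, 16):
        source, leaf, hop = h1_costs(B, p=0.1)
        # Eq. (4.19): hop = leaf + source - E[X_f] - E[X_p]
        assert math.isclose(hop, leaf + source - 2.0)
        # Eq. (4.21): the system throughput is limited by the most loaded node type.
        print(f"B={B:2d}  source={source:.2f}  leaf={leaf:.2f}  hop={hop:.2f}  "
              f"throughput={1 / max(source, leaf, hop):.4f}")
```

With unit costs the hop node is the bottleneck, consistent with the remark after Eq. 4.20.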
4.4 Tree-based Protocols with Local NAK-avoidance and Periodic Polling

To bound the overall system throughput of the generic Tree-NAPP protocol, we repeat the method used for the tree-based class: we first derive and bound the expected cost at the source, hop, and leaf nodes. As we did for tree-based protocols, we assume that there are enough receivers to form a full tree at each level. We place a superscript H2 on any variables relating to the generic Tree-NAPP protocol.

4.4.1  Source  node 

We consider first $X^{H2}$, the processing cost required by the source to successfully multicast an arbitrarily chosen packet to all receivers using the H2 protocol. The processing requirement for an arbitrary packet can be expressed as a sum of costs:

$X^{H2}$ = (initial transmission) + (retransmissions) + (receiving NAKs) + (receiving periodic HACKs)

$$X^{H2} = X_f + \sum_{i=1}^{M} X_p(i) + \sum_{m=2}^{M} X_n(m) + B\,X^{\phi}, \quad (4.25)$$

where $X_f$ is the time to get a packet from a higher layer, $X_p(i)$ is the time for (re)transmission attempt i, $X_n(m)$ is the time for receiving NAK m from the receiver set, $X^{\phi}$ is the amortized time to process the periodic HACK associated with the current congestion window, and M is the number of transmission attempts the source will have to make for this packet. Taking expectations, we have

$$E[X^{H2}] = E[X_f] + E[M]\,E[X_p] + (E[M] - 1)\,E[X_n] + B\,E[X^{\phi}]. \quad (4.26)$$

Using Eq. 4.10, the bound on E[M], we can bound Eq. 4.26 as follows:

$$E[X^{H2}] \in O\left(1 + \frac{p}{1-p}\ln B\right). \quad (4.27)$$

It then follows that, when p is a constant, $E[X^{H2}] \in O(1)$.
4.4.2 Leaf nodes


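Let $Y^{H2}$ denote the processing requirement on nodes that do not have to forward packets (leaves). The sum of costs can be expressed as

$Y^{H2}$ = (receiving transmissions) + (delivering to the higher layer) + (sending periodic HACKs) + (sending NAKs) + (receiving NAKs) + (processing timeouts)

$$Y^{H2} = \sum_{i=1}^{M}(1-p)\,Y_p(i) + Y_f + Y^{\phi} + \sum_{j=2}^{M}\left(\frac{Y_n(j)}{B} + \frac{(B-1)\,X_n(j)}{B}\right) + \mathrm{Prob}\{M_r > 2\}\sum_{k=2}^{M_r-1} Y_t(k). \quad (4.28)$$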

Let $Y_p(i)$ be the time it takes to process (re)transmission i, $Y_n(j)$ the time it takes to send NAK j, $X_n(j)$ the time it takes to receive NAK j (from another receiver), $Y_t(k)$ the time to set timer k, $Y_f$ the time to deliver a packet to a higher layer, and $Y^{\phi}$ the amortized cost of sending a periodic HACK for the group of packets of which this packet is a member. Taking expectations of Eq. 4.28,

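$$E[Y^{H2}] = E[M](1-p)\,E[Y_p] + E[Y_f] + E[Y^{\phi}] + (E[M] - 1)\left(\frac{E[Y_n]}{B} + \frac{(B-1)\,E[X_n]}{B}\right) + \mathrm{Prob}\{M_r > 2\}\left(E[M_r \mid M_r > 2] - 2\right)E[Y_t]. \quad (4.29)$$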

It follows from the distribution of $M_r$ that [2]

$$E[M_r \mid M_r > 2] = \frac{3 - 2p}{1 - p}. \quad (4.30)$$

Therefore, noting Eq. 4.30 and that $\mathrm{Prob}\{M_r > 2\} = p^2$, we derive from Eq. 4.29 the expected cost as

$$E[Y^{H2}] = E[M](1-p)\,E[Y_p] + E[Y_f] + E[Y^{\phi}] + (E[M] - 1)\left(\frac{E[Y_n]}{B} + \frac{(B-1)\,E[X_n]}{B}\right) + p^2\left(\frac{3 - 2p}{1 - p} - 2\right)E[Y_t]. \quad (4.31)$$

Again, using the bound on E[M] given in Eq. 4.10, we can bound Eq. 4.31 by

$$E[Y^{H2}] \in O\left(1 + \frac{1 - p + p\ln B + p^2(1 - 4p)}{1 - p}\right). \quad (4.32)$$

When p is treated as a constant, $E[Y^{H2}] \in O(1)$.
4.4.3 Hop nodes

The sum of costs for hop nodes, which act as both senders and receivers, is

$H^{H2}$ = (receiving transmissions) + (sending periodic HACKs) + (receiving periodic HACKs) + (receiving NAKs) + (sending NAKs) + (retransmissions to children)

$$H^{H2} = (1-p)\sum_{i=1}^{M} Y_p(i) + Y^{\phi} + B\,X^{\phi} + Y_f + \sum_{j=2}^{M}\left(\frac{Y_n(j)}{B} + \frac{(B-1)\,X_n(j)}{B}\right) + \mathrm{Prob}\{M_r > 2\}\sum_{k=2}^{M_r-1} Y_t(k) + \sum_{m=2}^{M}\left(X_n(m) + X_p(m)\right). \quad (4.33)$$

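Taking expectations and substituting Eq. 4.30, we obtain

$$E[H^{H2}] = (1-p)\,E[M]\,E[Y_p] + E[Y^{\phi}] + B\,E[X^{\phi}] + E[Y_f] + (E[M] - 1)\left(\frac{E[Y_n]}{B} + \frac{(B-1)\,E[X_n]}{B}\right) + p^2\left(\frac{3 - 2p}{1 - p} - 2\right)E[Y_t] + (E[M] - 1)\left(E[X_n] + E[X_p]\right). \quad (4.34)$$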


As for hop nodes in the H1 protocol, the cost at a hop node is the sum of the costs at a source and at a leaf, minus the cost of receiving the data from a higher layer and of one transmission. Substituting Eq. 4.26 and Eq. 4.31 into Eq. 4.34 and subtracting the difference, the expected cost can be expressed as

$$E[H^{H2}] = E[Y^{H2}] + E[X^{H2}] - E[X_f] - E[X_p]. \quad (4.35)$$


Therefore, Eq. 4.34 can be bounded by

$$E[H^{H2}] \in O\left(E[Y^{H2}]\right) + O\left(E[X^{H2}]\right) = O\left(1 + \frac{1 - p + p\ln B + p^2(1 - 4p)}{1 - p}\right). \quad (4.36)$$

When p is a constant, $E[H^{H2}] \in O(1)$. Therefore, all nodes in the Tree-NAPP protocol have a constant amount of work to do with regard to the number of receivers.


4.4.4  Overall  system  analysis 


The overall system throughput for the H2 protocol is the minimum throughput attainable at each type of node in the tree:

$$\Lambda^{H2} = \min\{\Lambda_s^{H2}, \Lambda_h^{H2}, \Lambda_r^{H2}\}. \quad (4.37)$$

From Equations 4.27, 4.32, and 4.36, it follows that

$$1/\Lambda^{H2} \in O\left(1 + \frac{1 - p + p\ln B + p^2(1 - 4p)}{1 - p}\right). \quad (4.38)$$

Accordingly, if either p is constant or p → 0, we obtain from Eq. 4.38

$$1/\Lambda^{H2} \in O(1).$$

Therefore, the maximum throughput of the Tree-NAPP protocol, as well as its throughput with non-negligible packet loss, is independent of the number of receivers. In Chapter 5 the exact throughput equations of all classes are compared, and it is shown that Tree-NAPP protocols attain the highest throughput, according to our model.

4.5  Ring-Based  Protocols 

In this section we analyze the throughput of ring-based protocols, which we denote by a superscript R, using the same assumptions as in Sections 4.3 and 4.4. Because we are assuming a constant stream of packets, we ignore the overhead that occurs when there are no ACKs on which to piggyback token-passing messages.

4.5.1  Source 

Source  nodes  practice  a  special  form  of  unicast  with  a  roaming  token  site.  The  sum  of 
costs  incurred  is 

$X^R$ = (initial transmission) + (processing ACKs) + (retransmissions)

$$X^R = X_f + X_p(1) + \sum_{i=1}^{L_r^U} X_a(i) + \sum_{m=2}^{M_r}\left(X_t(m) + X_p(m)\right), \quad (4.39)$$

where $M_r$ is the number of transmissions required for the packet to be received by the token site, with mean $E[M_r] = 1/(1-p)$, and $L_r^U$ is the number of ACKs sent unicast from receiver r (in this case the token site), i.e., the number of packets correctly received at r. This number is always 1; accordingly,

$$E[L_r^U] = E[M_r](1 - p) = 1. \quad (4.40)$$

Taking expectations of Eq. 4.39, we obtain

$$E[X^R] = E[X_f] + E[M_r]\,E[X_p] + (E[M_r] - 1)\,E[X_t] + E[L_r^U]\,E[X_a] = E[X_f] + \frac{1}{1-p}E[X_p] + \frac{p}{1-p}E[X_t] + E[X_a]. \quad (4.41)$$

If we again assume constant costs for all operations, it can be shown that

$$E[X^R] \in O\left(\frac{1}{1-p}\right), \quad (4.42)$$

which, when p is a constant, is O(1) with regard to the size of the receiver set.

4.5.2  Token  site 

The current token site has the following costs. (Note that both TRP and RMP specify that retransmissions are sent unicast to the other R − 1 receivers.)

$T^R$ = (receiving transmission) + (multicasting ACK/token) + (processing NAKs) + (unicasting retransmissions)

$$T^R = Y_f + \sum_{i=1}^{L_r^U}\left[Y_p(i) + Y_a(i)\right] + \sum_{j=1}^{L^R} X_n(j) + (R-1)\,\mathrm{Prob}\{M_r > 1\}\sum_{m=1}^{M_r} X_p(m), \quad (4.43)$$

where $L^R$ is the number of NAKs received at the token site when using a ring protocol. To derive $L^R$, consider $M_r$, the number of transmissions necessary for receiver r to successfully receive a packet. $M_r$ has an expected value of 1/(1 − p), and the last transmission is not NAKed. Because there are R − 1 other receivers sending NAKs to the token site, we obtain

$$E[L^R] = (R-1)\left(E[M_r] - 1\right) = \frac{(R-1)\,p}{1-p}. \quad (4.44)$$
Therefore, the mean processing time at the token site is

$$E[T^R] = E[Y_f] + E[Y_p] + E[Y_a] + E[L^R]\,E[X_n] + (R-1)\,p\,E[M_r]\,E[X_p] = E[Y_f] + E[Y_p] + E[Y_a] + \frac{(R-1)\,p}{1-p}\left(E[X_n] + E[X_p]\right). \quad (4.45)$$

The expected cost at the token site can be bounded by

$$E[T^R] \in O\left(1 + \frac{pR}{1-p}\right) \quad (4.46)$$

with regard to the number of receivers. When p is a constant, $E[T^R] \in O(R)$.
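Eq. 4.44 and Eq. 4.45 can be checked the same way. The Monte Carlo sketch below (hypothetical names; unit costs assumed in `token_site_cost`) estimates the NAKs reaching the token site and shows the token-site cost growing linearly with R for fixed p.

```python
import random


def simulate_token_site_naks(R: int, p: float, trials: int, seed: int = 7) -> float:
    """Monte Carlo estimate of E[L^R], the NAKs reaching the token site per packet.

    Assumes, as in Eq. (4.44), that each of the other R - 1 receivers needs a
    geometric number M_r of attempts (success probability 1 - p) and sends one NAK
    for every failed attempt, i.e. M_r - 1 NAKs.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        for _ in range(R - 1):
            attempts = 1
            while rng.random() < p:    # this attempt is lost, so the receiver NAKs it
                attempts += 1
            total += attempts - 1
    return total / trials


def token_site_cost(R: int, p: float, c: float = 1.0) -> float:
    """Eq. (4.45) with every mean processing time equal to c."""
    return 3 * c + (R - 1) * p / (1 - p) * (c + c)


if __name__ == "__main__":
    R, p = 50, 0.1
    print(simulate_token_site_naks(R, p, trials=4000), (R - 1) * p / (1 - p))
    for R in (10, 100, 1000):
        print(R, token_site_cost(R, p))
```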

4.5.3  Receivers 

Receivers practice a receiver-initiated protocol with the current token site. We assume that a single packet carries the ACK, token, and timestamp multicast from the token site for each data packet. The costs associated with an arbitrary packet are therefore

$Y^R$ = (receiving ACK/token/timestamp) + (receiving first transmission) + (sending NAKs) + (receiving retransmissions)

$$Y^R = Y_a + \mathrm{Prob}\{M_r = 1\}\,Y_p(1) + Y_f + \mathrm{Prob}\{M_r > 1\}\sum_{i=1}^{L_r^U} Y_p(i) + \mathrm{Prob}\{M_r > 1\}\sum_{m=2}^{M_r} Y_n(m) + \mathrm{Prob}\{M_r > 2\}\sum_{n=3}^{M_r} Y_t(n). \quad (4.47)$$

Each term of the above equation deserves explanation. The first term is the cost of receiving the ACK/token/timestamp packet from the token site; the second is the cost of receiving the first transmission sent from the sender, assuming it is received error-free; the third is the cost of delivering an error-free transmission to a higher layer; the fourth is the cost of receiving the retransmissions from the token site, assuming that the first transmission failed; and the last two terms reflect that a NAK is sent only if the first transmission attempt fails and that a timeout interrupt occurs only if a NAK was sent. Taking expectations, we obtain


$$E[Y^R] = E[Y_a] + (1-p)\,E[Y_p] + E[Y_f] + p\,E[L_r^U]\,E[Y_p] + p\left(E[M_r \mid M_r > 1] - 1\right)E[Y_n] + p^2\left(E[M_r \mid M_r > 2] - 2\right)E[Y_t]. \quad (4.48)$$


As shown previously [2],

$$E[M_r \mid M_r > 1] = \frac{2 - p}{1 - p}, \quad (4.49)$$

and, as stated in the previous section,

$$E[M_r \mid M_r > 2] = \frac{3 - 2p}{1 - p}. \quad (4.50)$$
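Equations 4.49 and 4.50 are instances of the memoryless property of the geometric distribution, $E[M_r \mid M_r > k] = k + 1/(1-p)$. A direct numerical check (function name hypothetical) sums the truncated series for $P(M_r = n) = p^{n-1}(1-p)$.

```python
def conditional_mean_attempts(p: float, k: int, terms: int = 10_000) -> float:
    """E[M_r | M_r > k] by direct summation, P(M_r = n) = p^(n-1) (1 - p), 0 < p < 1."""
    num = den = 0.0
    for n in range(k + 1, k + 1 + terms):
        w = p ** (n - 1) * (1 - p)
        num += n * w
        den += w
    return num / den


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.25):
        print(p,
              conditional_mean_attempts(p, 1), (2 - p) / (1 - p),        # Eq. (4.49)
              conditional_mean_attempts(p, 2), (3 - 2 * p) / (1 - p))    # Eq. (4.50)
```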


Substituting  Equations  4.40,  4.49,  and  4.50  into  Equation  4.48  we  have 


E [YR]  =  E [Xa]  +  (1  -  p)  E[YP]  +  E[Yf]  +  p  E[YP]  +  -J!—  ( E[Yn]  +  p  E[Yt]  j .  (4.51) 


P  \ 

Assuming all operations have constant costs, it can be shown that

$$E[Y_R] \in O\left(\frac{1+p^2}{1-p}\right), \qquad (4.52)$$

a bound that does not depend on the size of the receiver set. If p is treated as a constant, then $E[Y_R] \in O(1)$.
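To see the shape of Equation 4.51 numerically, the Python sketch below evaluates E[Y_R] with the unit processing costs used in the numerical results and a timer cost of 0.1, and checks that its NAK and timer terms equal the last two terms of Equation 4.48 once Equations 4.49 and 4.50 are substituted. The function names are ours, and the default costs are assumptions borrowed from the numerical section rather than part of the derivation.

```python
def receiver_cost(p, y_a=1.0, y_p=1.0, y_f=1.0, y_n=1.0, y_t=0.1):
    """Mean per-packet receiver cost E[Y_R] of Eq. 4.51 (ring-based protocol R).

    Defaults (assumed): every mean processing time is 1, timer cost is 0.1.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("loss probability p must lie in [0, 1)")
    return (y_a + (1 - p) * y_p + y_f + p * y_p
            + p / (1 - p) * (y_n + p * y_t))


def nak_timer_terms(p, y_n=1.0, y_t=0.1):
    """Last two terms of Eq. 4.48 after substituting Eqs. 4.49 and 4.50."""
    e_gt1 = (2 - p) / (1 - p)      # E[M_r | M_r > 1]
    e_gt2 = (3 - 2 * p) / (1 - p)  # E[M_r | M_r > 2]
    return p * (e_gt1 - 1) * y_n + p ** 2 * (e_gt2 - 2) * y_t


for p in (0.01, 0.1, 0.25, 0.5):
    cost = receiver_cost(p)
    print(f"p={p}: E[Y_R]={cost:.4f}, (1+p^2)/(1-p)={(1 + p**2) / (1 - p):.4f}")
```

No argument depends on the number of receivers, which is the O(1) behaviour stated above; the cost grows only through the factor p/(1-p).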


4.5.4  Overall  system  analysis 

The overall system throughput of R, the generic token ring protocol, is the minimum of the throughputs attainable at each of its parts (the sender, the token site, and the receivers):

$$\Lambda^R = \min\{\Lambda^R_s, \Lambda^R_t, \Lambda^R_r\}. \qquad (4.53)$$

From Equations 4.42, 4.46, and 4.52, it follows that, for constant p and for p → 0, respectively,

$$1/\Lambda^R \in O(R); \quad p \text{ constant}, \qquad (4.54)$$

$$1/\Lambda^R \in O(1); \quad p \to 0. \qquad (4.55)$$

When p → 0, the maximum throughput of this class of protocols is O(1) and independent of the number of receivers.
5. Numerical Results

To compare the relative performance of the various classes of protocols, all mean processing times are set equal to 1, except for the periodic timer costs $X_t$ and $Y_t$, which are set to 0.1. Figure 2 compares the relative throughputs of protocols A, N1, N2, H1, H2, and R, as defined in Chapter 2. The graphs plot the inverse of Equations 4.18, 4.34, and 4.45, which give the throughputs of the tree-based, tree-NAPP, and ring-based protocols, respectively, as well as the inverse of the throughput equations derived previously [2] for sender- and receiver-initiated protocols. The top, middle, and bottom graphs correspond to increasing probabilities of packet loss: 1%, 10%, and 25%, respectively.

The performance of the NAK-avoidance protocols, especially tree-NAPP protocols, is clearly superior. However, our assumptions give these two sub-classes an advantage over their base classes. First, we assume that no acknowledgments are lost or received in error. The effectiveness of NAK avoidance depends on the probability that NAKs reach all receivers; without our assumption, therefore, the effectiveness of NAK avoidance decreases as the number of receivers involved increases. Accordingly, the advantage this assumption confers on tree-NAPP protocols is limited by the branching factor, whereas the advantage it confers on RINA protocols grows with the size of the entire receiver set.
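The dependence of NAK avoidance on NAKs reaching every receiver can be illustrated with a deliberately simple model that is our own assumption, not part of the analysis above: each receiver that should hear a multicast NAK misses it independently with probability q. A NAK then suppresses all duplicate requests only with probability (1-q)^n, where n is the number of receivers that must hear it: n = B within a tree-NAPP local group, but n = R in a RINA protocol. The function name below is hypothetical.

```python
def prob_nak_reaches_all(q: float, n: int) -> float:
    """Probability that a multicast NAK reaches all n receivers when each
    receiver independently misses it with probability q (assumed model)."""
    return (1.0 - q) ** n


B = 10  # branching factor used in Figures 2 and 3
for q in (0.01, 0.1, 0.25):
    tree = prob_nak_reaches_all(q, B)
    rina = [prob_nak_reaches_all(q, r) for r in (100, 1000, 10000)]
    print(f"q={q}: tree-NAPP (n=B={B}) {tree:.3f}; "
          "RINA (n=100, 1000, 10000) " + ", ".join(f"{x:.3g}" for x in rina))
```

Under this model the tree-NAPP figure stays fixed as the receiver set grows, while the RINA figure decays geometrically in R.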

Second, we assume that the timers used for NAK avoidance are set perfectly. In reality, the messages used to set timers would be subject to end-to-end delays that exhibit no regularity and can become arbitrarily large.

We conjecture that the relative performance of the NAK-avoidance sub-classes would actually lie closer to that of their respective base classes, depending on the effectiveness of the NAK-avoidance scheme; in other words, the curves shown are upper bounds. Our results show that, when only the base classes are considered (since none has an advantage over another), the tree-based class performs better than all the other classes. When only the sub-classes that use NAK avoidance are considered, tree-NAPP protocols perform better than RINA protocols, even though our model gives RINA protocols an unfair advantage.


Figure  2:  The  throughput  graph  from  the  exact  equations  for  each  protocol.  The 
probability  of  packet  loss  is  1%,  10%,  and  25%  respectively.  The  branching  factor 
for  trees  is  set  at  10. 

It is the hierarchical organization of the receiver set in tree-based protocols that guarantees scalability and improves performance over other protocols. Using NAK avoidance on a small scale (within each local group) increases performance further. In addition, if NAK avoidance failed in a tree-NAPP protocol, its performance would still be independent of the size of the receiver set. RINA protocols do not have this property: failure of NAK avoidance in a RINA protocol would result in unscalable performance like that of a receiver-initiated protocol, which degrades quickly with increasing packet loss.


Figure  3:  Number  of  supportable  receivers  for  each  protocol.  The  probability  of 
packet  loss  is  1%,  10%,  and  25%  respectively.  The  branching  factor  for  trees  is  set 
at  10. 

Any increase in processor speed, or a smaller branching factor, would also increase throughput for all tree-based protocols. However, for the same number of receivers, a smaller branching factor means that some retransmissions must traverse a larger number of tree hops to reach the receivers that need them further down the tree. For example, if a packet is lost immediately at the source, the source multicasts the retransmission only to its children, and all other nodes in the tree must wait until the retransmission trickles down the tree. This latency problem can be addressed by taking advantage of the dependencies in the underlying multicast routing tree: a retransmission could be multicast only to the receivers attached to routers in the subtree rooted at the router of the receiver that requested the missing data. However, to date no proposed scheme accomplishes this task. The number of tree hops from a receiver to the source also affects how quickly the source can release data from memory in the presence of node failures, as discussed by Levine, Lavo, and Garcia-Luna-Aceves [19].
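The trade-off between branching factor and tree hops can be quantified with a small sketch that assumes, purely for illustration, that the R receivers fill a complete B-ary ACK tree rooted at the source; actual tree-based protocols need not build balanced trees. The depth returned is the largest number of tree hops a retransmission originating at the source may have to traverse. The function name is hypothetical.

```python
def tree_depth(receivers: int, branching: int) -> int:
    """Levels below the source needed to hold `receivers` nodes in a
    complete tree with fan-out `branching` (illustrative assumption)."""
    if branching < 2:
        raise ValueError("branching factor must be at least 2")
    depth, capacity, level_size = 0, 0, 1
    while capacity < receivers:
        level_size *= branching   # nodes on the next level down
        capacity += level_size    # total nodes the tree can hold so far
        depth += 1
    return depth


for b in (2, 5, 10, 20):
    print(f"B={b:2d}: depth for R=1000 is {tree_depth(1000, b)}, "
          f"for R=100000 is {tree_depth(100_000, b)}")
```

Under this idealized layout, 1000 receivers need 3 levels with B = 10 but 9 levels with B = 2, so a retransmission from the source may cross three times as many tree hops.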

Figure 3 shows the number of receivers supportable by each of the different classes, relative to processor speed requirements. This number is obtained by normalizing all classes to a baseline processor, as described by Pingali et al. [2, 3]. The baseline uses protocol A and can support exactly one receiver; if $\mu^w[R]$, $w \in \{A, N1, N2, H1, H2, R\}$, is the speed of the processor that can support at most R receivers under protocol w, we set $\mu^A[1] = 1$. The baseline cost is [2, 3]


$$E[X^A]\big|_{R=1} = \frac{3-p}{1-p}. \qquad (5.1)$$

Using Equations 5.1, 4.17, 4.34, and 4.45, we can derive the following values of $\mu$ for the tree-based, tree-NAPP, and ring-based protocols, respectively:


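$$
\begin{aligned}
\mu^{H1}[R] &= \frac{1-p}{3-p}\Big(2E[M](1-p) + 1 + 2\left(E[M] - 1\right) + B\,E[M](1-p)\Big) \\
&= \frac{1-p}{3-p}\Big(E[M]\big(4 + B - (2+B)p\big) - 1\Big), \qquad (5.2)
\end{aligned}
$$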


pm[R] 


i[kE[^2] 

E^j((4  -  p)  m\  -  1.9  +  0.1  B  +  P2  -  2) , 


(5.3) 


pR[R]  = 


1 


E  [Tr] 


E[XA] 

1 

E[X-4] 

1 


(fl-1  )P 
(!  ~P) 

2  (R-  1  )p' 

(!  -P) 


3  + 


(1  +  1) 


E[XA] 


(5.4) 
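The simplified form of Equation 5.2, μ^{H1}[R] = ((1-p)/(3-p))(E[M](4 + B - (2+B)p) - 1), involves E[M] only for a local group of B receivers, so it can be evaluated exactly and contains no dependence on R. The Python sketch below assumes that E[M], the expected number of transmissions until all n receivers of a group obtain a packet when each receiver loses each transmission independently with probability p, is computed from the standard series E[M] = Σ_{m≥0}(1 - (1 - p^m)^n); it also checks the simplified form against the unsimplified one. All function names are ours.

```python
import math


def expected_transmissions(p: float, n: int, tol: float = 1e-15) -> float:
    """E[M] = sum_{m>=0} P{M > m} = sum_{m>=0} (1 - (1 - p**m)**n)."""
    if not 0.0 <= p < 1.0 or n < 1:
        raise ValueError("need 0 <= p < 1 and n >= 1")
    total, m = 1.0, 1  # the m = 0 term is always 1
    while True:
        # 1 - (1 - x)**n computed stably via log1p/expm1
        tail = -math.expm1(n * math.log1p(-(p ** m)))
        total += tail
        if tail < tol:
            return total
        m += 1


def mu_h1(p: float, b: int = 10) -> float:
    """Normalized processor speed for tree-based protocol H1 (simplified Eq. 5.2)."""
    em = expected_transmissions(p, b)
    return (1 - p) / (3 - p) * (em * (4 + b - (2 + b) * p) - 1)


def mu_h1_unsimplified(p: float, b: int = 10) -> float:
    """Term-by-term form of Eq. 5.2 before simplification."""
    em = expected_transmissions(p, b)
    return (1 - p) / (3 - p) * (2 * em * (1 - p) + 1 + 2 * (em - 1)
                                + b * em * (1 - p))


for p in (0.01, 0.1, 0.25):
    print(f"p={p}: mu_H1 = {mu_h1(p):.3f} for every receiver-set size R")
```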
The numbers of supportable receivers derived for sender- and receiver-initiated protocols are [2, 3]

$$\mu^{A}[R] = \frac{1-p}{3-p}\,E[M]\big(2 + R(1-p)\big),$$

$$\mu^{N1}[R] = \frac{1-p}{3-p}\left(1 + E[M] + \frac{Rp}{1-p}\right),$$

$$\mu^{N2}[R] = \frac{1-p}{3-p}\,\big(2E[M]\big).$$
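Equation 5.1 makes the normalization explicit: dividing each protocol's cost by E[X^A] at R = 1, which equals (3-p)/(1-p), multiplies it by (1-p)/(3-p), so that μ^A[1] = 1. The Python sketch below evaluates μ^A[R], μ^{N1}[R], and μ^{N2}[R], taken as (1-p)/(3-p) times E[M](2 + R(1-p)), 1 + E[M] + Rp/(1-p), and 2E[M], respectively, using the exact series for E[M]; the check μ^A[1] = 1 serves as a consistency test. Function names are ours, and unit processing costs are assumed as in the numerical section.

```python
import math


def expected_transmissions(p: float, n: int, tol: float = 1e-15) -> float:
    """E[M] = sum_{m>=0} (1 - (1 - p**m)**n) for n receivers, loss probability p."""
    if not 0.0 <= p < 1.0 or n < 1:
        raise ValueError("need 0 <= p < 1 and n >= 1")
    total, m = 1.0, 1  # the m = 0 term is always 1
    while True:
        tail = -math.expm1(n * math.log1p(-(p ** m)))  # stable 1 - (1 - p**m)**n
        total += tail
        if tail < tol:
            return total
        m += 1


def baseline_factor(p: float) -> float:
    """1 / E[X^A] at R = 1, i.e. (1-p)/(3-p), so that mu_A[1] = 1 (Eq. 5.1)."""
    return (1 - p) / (3 - p)


def mu_a(p: float, r: int) -> float:
    return baseline_factor(p) * expected_transmissions(p, r) * (2 + r * (1 - p))


def mu_n1(p: float, r: int) -> float:
    return baseline_factor(p) * (1 + expected_transmissions(p, r) + r * p / (1 - p))


def mu_n2(p: float, r: int) -> float:
    return baseline_factor(p) * 2 * expected_transmissions(p, r)


for p in (0.01, 0.1, 0.25):
    for r in (10, 1000):
        print(f"p={p}, R={r}: A={mu_a(p, r):9.2f}  N1={mu_n1(p, r):7.2f}  "
              f"N2={mu_n2(p, r):5.2f}")
```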

Because the exact value of E[M] is difficult to compute for large values of R, we use the approximation [2, 3]

$$E[M] \approx \alpha + \frac{H_R - H_{35}}{\ln(1/p)}, \qquad (5.5)$$

where α is the value of E[M] for R = 35 and $H_n$ is the nth harmonic number. When evaluating $\mu^{H1}[R]$ and $\mu^{H2}[R]$, an exact value of E[M] is used, because the number of receivers in a local group is always R = B = 10.
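The approximation in Equation 5.5 can be compared with the exact series for E[M]. The Python sketch below anchors α at R = 35 as described above and assumes the harmonic-number difference is divided by ln(1/p), which matches the known growth of the expected maximum of R independent geometric variables; the function names are ours.

```python
import math


def expected_transmissions(p: float, n: int, tol: float = 1e-15) -> float:
    """Exact E[M] = sum_{m>=0} (1 - (1 - p**m)**n) for n receivers."""
    if not 0.0 <= p < 1.0 or n < 1:
        raise ValueError("need 0 <= p < 1 and n >= 1")
    total, m = 1.0, 1  # the m = 0 term is always 1
    while True:
        tail = -math.expm1(n * math.log1p(-(p ** m)))  # stable 1 - (1 - p**m)**n
        total += tail
        if tail < tol:
            return total
        m += 1


def harmonic(n: int) -> float:
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


def approx_transmissions(p: float, n: int, anchor: int = 35) -> float:
    """Eq. 5.5 as read here: alpha + (H_n - H_anchor) / ln(1/p), requires 0 < p < 1."""
    alpha = expected_transmissions(p, anchor)  # exact E[M] at the anchor size
    return alpha + (harmonic(n) - harmonic(anchor)) / math.log(1.0 / p)


for p in (0.1, 0.25):
    for n in (100, 1000, 10000):
        exact = expected_transmissions(p, n)
        approx = approx_transmissions(p, n)
        print(f"p={p}, R={n}: exact {exact:.4f}, approx {approx:.4f}")
```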

From Figure 3, it is clear that only the tree-based classes can support any number of receivers for the same processor speed bound at each node. It is also clear that, in terms of performance, tree-NAPP protocols are superior to the other classes.

Because retransmissions in ring-based protocols are unicast, the performance of these protocols approaches that of sender-initiated protocols; this indicates that allowing only multicast retransmissions would improve performance greatly.
6. Conclusions

We have compared and analyzed the four known classes of reliable multicast protocols. The results are summarized in Table 1. It is already known that sender-initiated protocols are not scalable at all, because the source must account for every receiver listening. Receiver-initiated protocols are more scalable, especially when NAK-suppression schemes are used to avoid overloading the source with retransmission requests. However, because of its unbounded-memory requirement, this protocol class can be used efficiently only with application-layer support, and only for limited applications. Ring-based protocols were designed for atomic and total ordering of packets. TRP and RMP limit their throughput by requiring retransmissions to be unicast. It would be possible to reduce the cost bound to $O(\ln R)$, assuming p to be a constant, if the NAK-avoidance techniques presented by Ramakrishnan and Jain [10] were used.

Our  analysis  shows  that  trees  are  the  answer  to  the  scalability  problem  for  reliable 
multicasting.  Only  tree-based  and  tree-NAPP  classes  have  a  throughput  that  is  constant 
with  respect  to  the  number  of  receivers  even  when  the  probability  of  packet  loss  is  not 
negligible.  Furthermore,  our  model  predicts  tree-NAPP  protocols  as  the  best  method  for 
supporting  reliable  multicast. 

Of course, our model constitutes only a crude approximation of the actual behavior of reliable multicast protocols. In the Internet, an ACK or a NAK is simply another packet, and the probability of an ACK or NAK being lost or received in error is much the same as that of a data packet. Our assumption that ACKs and NAKs are never lost therefore gives protocols that use NAK avoidance an advantage over the other classes, and it is more reasonable to compare them separately: our results show that tree-based protocols without NAK avoidance perform better than the other classes that do not use NAK avoidance, and that tree-NAPP protocols perform better than RINA protocols, even though RINA protocols have an artificial advantage over every other class.

We conjecture that, once the effect of ACK or NAK failure is accounted for, the same relative performance will be seen among the protocols that do not use NAK avoidance. Furthermore, we believe the true performance of tree-NAPP and RINA protocols will lie closer to that of their respective base classes, depending on the effectiveness of the NAK-avoidance scheme; in other words, the curves shown for NAK-avoidance protocols are upper bounds.

The fact that packet failures are correlated along the multicast routing trees set up by CBT or PIM means that our model's assumption of independent packet failures leads to lower bounds on the maximum throughput, because reliable multicast protocols can take advantage of the structure of the underlying multicast routing tree. However, this assumption gives no advantage to any class, and we believe the relative performances would not change if we did not make it.

Because tree-based protocols delegate responsibility for retransmission to receivers, and because they apply techniques used in either sender- or receiver-initiated protocols only within the local groups of the ACK tree (i.e., a node and its children in the tree), any mechanism that can be used in a receiver-initiated protocol can be adopted in a tree-based protocol. The added benefit is that the throughput and the number of supportable receivers are completely independent of the size of the receiver set, regardless of the likelihood with which packets are received correctly at the receivers. Based on these results, our future work focuses on developing new tree-based protocols for scalable reliable multicasting in the Internet [19].